











DivX MPEG-4 Codec and Its Interface



June 24, 2001













Version	Key Author	Date Prepared	Reviewed by	Review Date
1.0 - Initial Draft	Adam Li	6/24/2001	Eugene Kuznetsov	5/15/2001
1.1 - Decore Operation	Andrea Graziani	7/3/2001	Eugene Kuznetsov	7/10/2001












TABLE OF CONTENTS
1.	INTRODUCTION
2.	CODEC STRUCTURE
3.	INTERFACE THROUGH THE CODEC CORE INTERFACE - OPERATION OVERVIEW
4.	CODEC CORE INTERFACE PROTOTYPE
5.	ENCORE2 OPERATION
5.1.	ENCORE - INITIALIZATION
5.2.	ENCORE - ENCODING
5.3.	ENCORE - RELEASE
5.4.	ENCORE - EXAMPLE
6.	DECORE2 OPERATION
6.1.	DECORE - INITIALIZATION
6.2.	DECORE - DECODING
6.3.	DECORE - RELEASE
6.4.	DECORE - DECODING DIVX ;-) 3.11
6.5.	DECORE - EXTRA SETTINGS
6.6.	DECORE - EXAMPLE

1. Introduction 

The heart of the DivX MPEG-4 Codec is the DivX codec core, the engine of the codec. The codec core takes either video images or an MPEG-4 bitstream as input and uses compression or decompression to convert between the two formats. It consists of two parts: an encoder, which compresses video images into an MPEG-4 bitstream, and a decoder, which decompresses an MPEG-4 bitstream back into video images. The encoder core is named "encore" and the decoder core "decore".

This document describes the interface of DivX MPEG-4 Codec.

2. Codec structure

On x86 Linux, the codec consists of two shared libraries, libdivxdecore.so.0.0.0 (the decoder) and libdivxencore.so.0.0.0 (the encoder). To use them, link against the libraries (with the -ldivxdecore and -ldivxencore flags), include the header files decore.h and encore2.h, and call the interface functions as if they were defined in your own code.

3. Interface Through the Codec Core Interface - Operation Overview

Before going into the details of the codec core functions, here is a brief overview of the operation of the codec core.

Encoding proceeds as follows:

For single-pass encoding:

1. encore() is called to initialize a new instance and its coding parameters, references and other necessary information.
2. encore() is called once for each frame to be encoded. The input is the video frame and its coding parameters. The output is the compressed MPEG-4 bitstream.
3. After all the video frames have been encoded, encore() is called one more time to end the instance and release all the resources allocated for it.

For two-pass encoding:

Single-pass encoding, as described above, is executed twice. In the first pass, the codec measures and records the complexity of the video (without actually writing the bitstream). The result is a log file created from the analyzed stream, which is used to determine the best encoding parameters for each frame (currently this is done outside of encore). In the second pass, the codec encodes the video accordingly and outputs the actual MPEG-4 bitstream.

Decoding proceeds as follows:

1. decore() is called to initialize a new instance and its coding parameters, references and other necessary information.
2. decore() is called once for each frame to be decoded. The input is the compressed MPEG-4 bitstream. The output is the decoded video frame.
3. After the entire bitstream has been decoded, decore() is called one more time to end the instance and release all the resources allocated for it.

4. Codec Core Interface Prototype

The interface has the following prototype:

// the prototype of the encore() - main encode engine entrance
int encore(
	void *handle,		    // handle	- the handle of the calling entity
	unsigned long enc_opt,  // enc_opt	- the option for encoding, see below
	void *param1,		    // param1	- the pointer to input parameter structure
	void *param2		    // param2	- the pointer to output parameter structure 
);

// the prototype of the decore() - main decode engine entrance
int decore(
	void *handle,		    // handle	- the handle of the calling entity
	unsigned long dec_opt,  // dec_opt	- the option for decoding, see below
	void *param1,		    // param1	- the pointer to input parameter structure
	void *param2		    // param2	- the pointer to output parameter structure
);

handle is a unique unsigned long integer that identifies an encore/decore instance. The codec core remembers the coding parameters and reference pictures associated with each handle, and the calling application or thread passes the handle on every call so that the core knows which instance the operation refers to. For decore, the caller chooses the handle; it can be any nonzero value as long as it is unique. For encore, the handle is created and initialized by the encoder itself.

enc_opt / dec_opt tell the core which operation to perform. param1 / param2 are two parameters whose meaning depends on the operation. Usually, param1 passes a pointer to the parameter structure that is input to the codec, and param2 passes a pointer to the parameter structure that is output from the codec.

5. Encore2 Operation

Encore has the following encoding options.

// encore options (the enc_opt parameter of encore())
#define ENC_OPT_INIT	    0	 // initialize the encoder, return a handle
#define ENC_OPT_RELEASE	    1   // release all the resource associated with the handle
#define ENC_OPT_ENCODE	    2	 // encode a single frame in one-pass mode
#define ENC_OPT_ENCODE_VBR	    3	 // encode a single frame in two-pass mode

Encore can return the following values:

// return code of encore()
#define ENC_FAIL		   -1
#define ENC_OK		   0
#define ENC_MEMORY		   1
#define ENC_BAD_FORMAT	   2

5.1. Encore - Initialization

The encore initialization process starts when ENC_OPT_INIT is set at enc_opt. Encore will initialize a new instance associated with handle.

When ENC_OPT_INIT is set, the calling thread needs to provide param1 pointing to the following data structure. param2 has no meaning and should be set to NULL.

typedef struct _ENC_PARAM_ {
	int x_dim;		     // the x dimension of the frames to be encoded
	int y_dim;		     // the y dimension of the frames to be encoded
	float framerate;	     // the frame rate of the sequence to be encoded
	long bitrate;		     // the bitrate of the target encoded stream
	int rc_period;	     // the intended rate control averaging period
	int rc_reaction_period;  // the reaction period for rate control
	int rc_reaction_ratio;   // the ratio for down/up rate control
	int max_quantizer; 	     // the upper limit of the quantizer
	int min_quantizer; 	     // the lower limit of the quantizer
	int max_key_interval;    // the maximum interval between key frames
	int use_bidirect;	     // use bidirectional coding
	int deinterlace;	     // fast deinterlace
	int quality;		     // the quality of compression ( 1 - fastest, 5 - best )
	int obmc;		     // flag to enable overlapped block motion compensation mode
	void *handle;		     // the empty handle, which will be filled by encore
} ENC_PARAM;

Encore returns ENC_OK if initialization succeeds. Encore returns ENC_MEMORY if there is a memory allocation error.

At a minimum the parameters x_dim and y_dim must be initialized (valid range: 0<x_dim<=1920, 0<y_dim<=1280, both dimensions should be even).  The other parameters can be set to 0, in which case they'll be initialized to default values, or can be specified directly.
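The range rule above can be checked before ENC_OPT_INIT is issued. The following is a minimal sketch, not part of encore2.h: the macro names and the function enc_dims_valid() are hypothetical, and only the limits themselves come from the text above.

```c
#include <stdbool.h>

/* Limits quoted in the text above. The macro names are hypothetical
 * and are not taken from encore2.h. */
#define ENC_MAX_X_DIM 1920
#define ENC_MAX_Y_DIM 1280

/* Hypothetical helper: returns true when both dimensions are inside
 * the documented range and are even. */
bool enc_dims_valid(int x_dim, int y_dim)
{
    if (x_dim <= 0 || x_dim > ENC_MAX_X_DIM)
        return false;
    if (y_dim <= 0 || y_dim > ENC_MAX_Y_DIM)
        return false;
    return (x_dim % 2 == 0) && (y_dim % 2 == 0);
}
```

A caller might run this check on the capture size first and only then fill in x_dim and y_dim of ENC_PARAM.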

The rc_period is the averaging period for the rate control (RC) algorithm; it determines how quickly the RC algorithm forgets the history of the rate control. A larger rc_period value usually results in a more accurate overall rate. However, it should not be too large compared to the length of the sequence. A common value is 2000.

The rc_reaction_period controls how fast the RC adapts to recent scenes. A larger rc_reaction_period value usually results in better quality in high motion scenes and worse quality in low motion scenes. A common value is 10.

The rc_reaction_ratio controls the relative sensitivity of the reaction to high or low motion scenes. A larger rc_reaction_ratio value usually results in better high motion scenes but larger bit consumption. A common value is 20.

The max_key_interval sets the maximum interval between key (INTRA) frames. In one-pass mode, a key frame is inserted automatically during encoding when the codec detects a scene change. If there are no scene changes for a long period of time, a key frame is inserted to ensure that the interval never exceeds the specified maximum key frame interval.

The use_bidirect option is reserved for future implementation and currently is not supported. It should be set to zero.

The deinterlace option is currently ignored on x86 machines without MMX support.

The quality parameter determines the motion estimation algorithm encore will perform on the input frames.  For the higher quality settings, a more thorough motion search will be performed.  This will usually result in a better match of the blocks, and hence fewer bits needed for coding the residue texture errors. In other words, the quality of the decoded video will be better for the same resulting bitrate.

The obmc option is reserved for future implementation and currently is not supported. It should be set to zero.

5.2. Encore - Encoding

The encore encoding process starts when ENC_OPT_ENCODE or ENC_OPT_ENCODE_VBR is set at enc_opt. Encore will encode the input video frame using the coding parameter and reference frame associated with the handle.

When ENC_OPT_ENCODE is set, encore will analyze the input frame and automatically detect the scene changes. The quant and intra instruction inputs (see below) will be ignored. 

When ENC_OPT_ENCODE_VBR is set, encore will encode the input frame following the quant and intra instructions in the input.

In this operation, the calling thread needs to provide param1 pointing to an ENC_FRAME structure and param2 pointing to an ENC_RESULT structure. ENC_FRAME is defined as follows.

typedef struct _ENC_FRAME_ {
	void *image;		// the image frame to be encoded
	void *bitstream;	// the buffer for encoded bitstream
	int length;		// the length of the encoded bitstream
	int colorspace;	// the format of image frame
	int quant;		// quantizer for this frame; only used in VBR modes
	int intra;		// force this frame to be intra/inter; only used in VBR 2-pass
	void *mvs;		// optional pointer to array of motion vectors
} ENC_FRAME;

The image field points to the input bitmap. The bitstream field points to a buffer large enough to hold the output MPEG-4 bitstream. Encore does not check for buffer overflow: such checks are expensive, and recovering from an overflow would be almost impossible. The theoretical upper limit of the encoded frame size is around 6 bytes/pixel, or about 2.5 MB for a 720x576 frame. On success, encore sets length to the number of bytes written into the bitstream buffer.
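Because encore performs no overflow checks, the caller has to size the bitstream buffer conservatively. The sketch below simply applies the 6 bytes/pixel upper limit quoted above; the function and macro names are hypothetical and do not come from encore2.h.

```c
#include <stddef.h>

/* Worst-case encoded size per pixel, as quoted in the text above.
 * The macro name is hypothetical. */
#define ENC_WORST_CASE_BYTES_PER_PIXEL 6

/* Hypothetical helper: number of bytes to reserve for the bitstream
 * buffer of one frame, or 0 for invalid dimensions. */
size_t enc_worst_case_bytes(int x_dim, int y_dim)
{
    if (x_dim <= 0 || y_dim <= 0)
        return 0;
    return (size_t)x_dim * (size_t)y_dim * ENC_WORST_CASE_BYTES_PER_PIXEL;
}
```

For a 720x576 frame this gives 2,488,320 bytes, which matches the figure of roughly 2.5 MB given above.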

The colorspace indicates the color space the input image is in. The value of colorspace must be one of the following.

#define ENC_CSP_RGB24    0    // common 24-bit RGB, ordered as b-g-r
#define ENC_CSP_YV12     1    // planar YUV, U & V subsampled by 2 in both directions,
                              // average 12 bits per pixel; order of components y-v-u
#define ENC_CSP_YUY2     2    // packed YUV, U and V subsampled by 2 horizontally,
                              // average 16 bits per pixel; order of components y-u-y-v
#define ENC_CSP_UYVY     3    // same as above, but order of components is u-y-v-y
#define ENC_CSP_I420     4    // same as ENC_CSP_YV12, but chroma components are
                              // swapped (order y-u-v)
#define ENC_CSP_IYUV     ENC_CSP_I420

The encoder is most effective in modes ENC_CSP_I420 and ENC_CSP_YV12. Conversion from mode ENC_CSP_UYVY is currently not optimized.
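The bits-per-pixel figures in the list above determine how large the input image buffer must be. The following sketch computes that size under the assumption that rows are tightly packed (no padding between scanlines), which the text does not state explicitly. The function name is hypothetical; the constants are repeated only so that the fragment compiles on its own and should be dropped when encore2.h is included.

```c
#include <stddef.h>

/* Values copied from the list above. */
#define ENC_CSP_RGB24 0
#define ENC_CSP_YV12  1
#define ENC_CSP_YUY2  2
#define ENC_CSP_UYVY  3
#define ENC_CSP_I420  4

/* Hypothetical helper: size in bytes of one input frame, assuming
 * tightly packed rows. Returns 0 for an unknown colorspace or
 * invalid dimensions. */
size_t enc_image_bytes(int colorspace, int x_dim, int y_dim)
{
    size_t pixels;

    if (x_dim <= 0 || y_dim <= 0)
        return 0;
    pixels = (size_t)x_dim * (size_t)y_dim;

    switch (colorspace) {
    case ENC_CSP_RGB24:            /* 24 bits per pixel */
        return pixels * 3;
    case ENC_CSP_YV12:             /* 12 bits per pixel on average */
    case ENC_CSP_I420:
        return pixels * 3 / 2;
    case ENC_CSP_YUY2:             /* 16 bits per pixel on average */
    case ENC_CSP_UYVY:
        return pixels * 2;
    default:
        return 0;
    }
}
```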


When encoding is performed with ENC_OPT_ENCODE, the quant and intra fields of the ENC_FRAME structure are ignored. Encore2 also allows more precise control over the encoding process. To use this feature, pass ENC_OPT_ENCODE_VBR as the enc_opt argument of encore(). In this case, quant instructs encore to encode the current frame with the specified quantizer. The valid range of this field is 1 to 31, with 1 giving the highest quality and 31 the smallest bitstream size. The intra field forces the current frame to be encoded as an INTRA frame (intra = 1) or an INTER frame (intra = 0). When intra is set to -1, encore makes the decision internally.
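When per-frame instructions come from an external source such as a first-pass log, it may be worth validating them before they are copied into ENC_FRAME. The sketch below encodes only the ranges stated above (quant 1 to 31, intra equal to -1, 0 or 1); both function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: true when quant and intra are inside the
 * ranges documented for ENC_OPT_ENCODE_VBR. */
bool enc_vbr_instruction_valid(int quant, int intra)
{
    bool quant_ok = (quant >= 1 && quant <= 31);
    bool intra_ok = (intra == -1 || intra == 0 || intra == 1);
    return quant_ok && intra_ok;
}

/* Hypothetical helper: forces a quantizer into the range 1..31. */
int enc_clamp_quant(int quant)
{
    if (quant < 1)
        return 1;
    if (quant > 31)
        return 31;
    return quant;
}
```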



The ENC_RESULT structure, passed through param2, is used by encore to return the results of the encoding operation.

typedef struct _ENC_RESULT_ {
	int isKeyFrame; 		// the current frame is encoded as a key frame
	int quantizer;		// the quantizer used to encode the current frame
	int texture_bits;		// the number of bits used for texture coding
	int motion_bits;		// the number of bits used for motion vectors
	int total_bits;		// the total number of bits used for the current frame
} ENC_RESULT;

The isKeyFrame variable is set to 1 if the current frame is encoded as a key frame, otherwise it is set to 0.
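The per-frame results can be accumulated by the caller, for example to report the achieved bitrate. The sketch below is one possible way to do this; the two function names are hypothetical, and the structure is repeated (as defined in the text above) only so that the fragment compiles without encore2.h.

```c
/* Copied from the definition above so that the sketch is
 * self-contained; omit it when encore2.h is included. */
typedef struct _ENC_RESULT_ {
    int isKeyFrame;
    int quantizer;
    int texture_bits;
    int motion_bits;
    int total_bits;
} ENC_RESULT;

/* Hypothetical helper: average bitrate in bits per second over
 * n_frames results, for a sequence played at `framerate` fps. */
double enc_average_bitrate(const ENC_RESULT *results, int n_frames,
                           double framerate)
{
    double sum = 0.0;
    int i;

    if (results == 0 || n_frames <= 0)
        return 0.0;
    for (i = 0; i < n_frames; i++)
        sum += results[i].total_bits;
    return sum * framerate / n_frames;
}

/* Hypothetical helper: number of frames encoded as key frames. */
int enc_count_key_frames(const ENC_RESULT *results, int n_frames)
{
    int count = 0;
    int i;

    if (results == 0)
        return 0;
    for (i = 0; i < n_frames; i++)
        if (results[i].isKeyFrame)
            count++;
    return count;
}
```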

Encore returns ENC_OK if encoding succeeds. Encore returns ENC_BAD_FORMAT if the input frame format does not match the format set at initialization for the current handle.

5.3. Encore - Release

The encore releasing process starts when ENC_OPT_RELEASE is set at enc_opt. Encore will purge all the information and release all the resources allocated for handle, and delete handle from its database.

When ENC_OPT_RELEASE is set, both param1 and param2 have no meaning and should be set to NULL.

Encore returns ENC_OK.

5.4. Encore - Example


	/** Create encoder **/
	ENC_PARAM param;
	memset(&param, 0, sizeof(param));
	param.x_dim=640;
	param.y_dim=480;
	param.framerate=25.;
	param.min_quantizer=4;
	param.max_quantizer=4;
	param.quality=1;
	encore(0, ENC_OPT_INIT, &param, 0);
	void* handle=param.handle;

	int progress=0;

	ENC_FRAME fr;
	fr.colorspace=ENC_CSP_YUY2;
	fr.mvs=0;

	/** Encode 100 frames and calculate total size **/
	for(int i=0; i<100; i++)
	{
	    fr.image=m_pInput[i];
	    fr.bitstream=m_pOutput;

	    ENC_RESULT res;
	    encore(handle, ENC_OPT_ENCODE, &fr, &res);

	    printf("Frame %d encoded as %s into %d bytes\n",
	        i, res.isKeyFrame?"INTRA":"INTER", fr.length);
	    progress+=fr.length;
	}

	printf("100 frames encoded into %d bytes\n", progress);
	/** Release the encoder **/
	encore(handle, ENC_OPT_RELEASE, 0, 0);

6. Decore2 Operation

Decore has the following decoding options.

// decore options
#define DEC_OPT_MEMORY_REQS	0
#define DEC_OPT_INIT		1
#define DEC_OPT_RELEASE		2
#define DEC_OPT_SETPP		3 // set postprocessing mode
#define DEC_OPT_SETOUT		4 // set output mode
#define DEC_OPT_FRAME		5
#define DEC_OPT_FRAME_311		6

Decore can return the following values:

// decore return values
#define DEC_OK		0
#define DEC_MEMORY		1
#define DEC_BAD_FORMAT	2
#define DEC_EXIT		3

6.1. Decore - Initialization

Before initializing decore(), the application should allocate the data structures. To do this, the application will call decore() setting DEC_OPT_MEMORY_REQS as dec_opt. When DEC_OPT_MEMORY_REQS is set, the calling thread needs to provide the input parameter (param1) pointing to a DEC_PARAM data structure and the output parameter (param2) pointing to a DEC_MEM_REQS structure. The two data structures mentioned are defined below.
 
typedef struct 
{
    int x_dim; 		// x dimension of the frames to be decoded
    int y_dim; 		// y dimension of the frames to be decoded
    int output_format;	// output color format
    int time_incr;
    DEC_BUFFERS buffers;
} DEC_PARAM;

typedef struct
{
    unsigned long mp4_edged_ref_buffers_size;
    unsigned long mp4_edged_for_buffers_size;
    unsigned long mp4_edged_back_buffers_size;
    unsigned long mp4_display_buffers_size;
    unsigned long mp4_state_size;
    unsigned long mp4_tables_size;
    unsigned long mp4_stream_size;
    unsigned long mp4_reference_size;
} DEC_MEM_REQS;

The application must provide a meaningful param1 containing the correct frame dimensions. Decore returns, through param2, the sizes of the buffers and data structures that the application needs to allocate.

At this point the application should allocate the required memory and then initialize decore.
The memory that decore will use is described in the DEC_BUFFERS structure, defined as:

typedef struct
{
    void * mp4_edged_ref_buffers;  
    void * mp4_edged_for_buffers;  
    void * mp4_edged_back_buffers;
    void * mp4_display_buffers;
    void * mp4_state;
    void * mp4_tables;
    void * mp4_stream;
    void * mp4_reference;
} DEC_BUFFERS;
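Since several buffers have to be allocated before DEC_OPT_INIT, a failure part-way through should not leak the buffers already obtained. The following generic sketch allocates a list of buffers on an all-or-nothing basis. The function names are hypothetical and the code does not depend on decore.h; it also zeroes every buffer, which goes slightly beyond what the text requires.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates and zeroes n buffers whose sizes are
 * given in `sizes`. On success returns 0 and fills `bufs`. On failure
 * frees whatever was already allocated and returns -1. */
int alloc_buffers(void **bufs, const unsigned long *sizes, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* Request at least one byte so a zero size still yields a
         * pointer that can be passed to free(). */
        bufs[i] = calloc(1, sizes[i] ? (size_t)sizes[i] : 1);
        if (bufs[i] == NULL) {
            while (i > 0) {
                i--;
                free(bufs[i]);
                bufs[i] = NULL;
            }
            return -1;
        }
    }
    return 0;
}

/* Hypothetical helper: frees all n buffers and clears the pointers. */
void free_buffers(void **bufs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        free(bufs[i]);
        bufs[i] = NULL;
    }
}
```

In use, the sizes would be taken from the fields of DEC_MEM_REQS and the resulting pointers assigned to the matching fields of DEC_BUFFERS.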

The time_incr field in DEC_PARAM indicates the number of evenly spaced ticks within one modulo time. This information is usually present in the VOL header of each stream (each stream contains one instance of the VOL). A default value for time_incr should always be indicated.

The decore initialization process starts when DEC_OPT_INIT is set at dec_opt. The decore will initialize a new instance associated with handle.

When DEC_OPT_INIT is set, the calling thread needs to provide param1 pointing to a valid DEC_PARAM data structure. In this case, param2 has no meaning and should be set to NULL.

Decore returns DEC_OK if initialization succeeds. Decore returns DEC_MEMORY if there is a memory allocation error.

The output_format value must be chosen from the following valid formats:

// supported output formats
#define DEC_YUY2		1
#define DEC_YUV2 		DEC_YUY2	
#define DEC_UYVY		2
#define DEC_420		3
#define DEC_RGB32		4 
#define DEC_RGB24		5 
#define DEC_RGB555		6 
#define DEC_RGB565		7	
#define DEC_RGB32_INV	8
#define DEC_RGB24_INV	9
#define DEC_RGB555_INV 	10
#define DEC_RGB565_INV 	11
#define DEC_USER 		12

DEC_YUY2 and DEC_UYVY are packed YUV formats. DEC_420 is planar YUV with chrominance planes subsampled by 2 in both directions. DEC_RGB* formats correspond to different flavors of RGB, with or without vertical flipping of the output.
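The caller has to provide an output buffer that is large enough for the chosen format. The sketch below estimates that size from the usual bit depths of these formats (16 bits for packed YUV and for RGB555/RGB565, 12 bits for planar 4:2:0, 24 and 32 bits for RGB24 and RGB32). It assumes tightly packed rows, which the text does not guarantee, and the function names are hypothetical. The constants are repeated from the list above only so that the fragment compiles on its own.

```c
#include <stddef.h>

/* Values copied from the list above. */
#define DEC_YUY2        1
#define DEC_UYVY        2
#define DEC_420         3
#define DEC_RGB32       4
#define DEC_RGB24       5
#define DEC_RGB555      6
#define DEC_RGB565      7
#define DEC_RGB32_INV   8
#define DEC_RGB24_INV   9
#define DEC_RGB555_INV  10
#define DEC_RGB565_INV  11
#define DEC_USER        12

/* Hypothetical helper: average bits per pixel of an output format.
 * Returns 0 for DEC_USER (no bitmap is written) and unknown values. */
int dec_bits_per_pixel(int output_format)
{
    switch (output_format) {
    case DEC_420:
        return 12;
    case DEC_YUY2:
    case DEC_UYVY:
    case DEC_RGB555:
    case DEC_RGB565:
    case DEC_RGB555_INV:
    case DEC_RGB565_INV:
        return 16;
    case DEC_RGB24:
    case DEC_RGB24_INV:
        return 24;
    case DEC_RGB32:
    case DEC_RGB32_INV:
        return 32;
    default:
        return 0;
    }
}

/* Hypothetical helper: bytes needed for one decoded frame, assuming
 * tightly packed rows. */
size_t dec_output_bytes(int output_format, int x_dim, int y_dim)
{
    if (x_dim <= 0 || y_dim <= 0)
        return 0;
    return (size_t)x_dim * (size_t)y_dim
           * (size_t)dec_bits_per_pixel(output_format) / 8;
}
```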

The last format (DEC_USER) lets the caller perform colorspace conversion manually with optimal efficiency. See the next section for more details.

6.2. Decore - Decoding

The decore decoding process starts when DEC_OPT_FRAME decoding option is set at dec_opt. Decore will decode the input bitstream using the coding parameter and the reference frame associated with the handle. 

In this operation, the caller needs to provide param1 pointing to the following data structure.

typedef struct 
{
	void *bmp;		// the buffer that receives the decoded image
	void *bitstream;	// the buffer holding the encoded bitstream
	long length;		// the length of the encoded bitstream
	int render_flag;  	// 1: render the bitmap, 0: do not render the output
} DEC_FRAME;

The bitstream field points to a buffer holding the input MPEG-4 bitstream, and length indicates how many bytes of bitstream data the buffer holds. The bmp field points to a buffer that receives the output image. The render_flag field can be used to speed up the decoder when it is running late: if render_flag is 0, the decoder will not produce a valid bmp, so the renderer should not display the returned bmp.

In a special case of color output format DEC_USER, bmp pointer is treated as a pointer to the structure in the following format:

typedef struct
{
	void *y;
	void *u;
	void *v;
	int stride_y;
	int stride_uv;
} DEC_PICTURE;

Its members are filled in after successful decompression of a frame. The fields y, u and v will contain pointers to internal decoder memory buffers, and the fields stride_y and stride_uv will contain the strides of these buffers (the distance in bytes between sequential scanlines). The caller must perform clipping and color space conversion itself.

Warning: these pointers may be valid only until the next call to decore(). Strides may be larger than the dimensions of the image. Do not use the returned values if you passed render_flag=0.
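Because the pointers may become invalid on the next call and the strides may exceed the image width, a caller that wants to keep a frame has to copy each plane row by row. The following is a minimal sketch of such a copy; copy_plane() is a hypothetical name, and the code assumes one byte per sample, which the text does not state explicitly.

```c
#include <string.h>

/* Hypothetical helper: copies a width x height plane whose rows are
 * src_stride bytes apart into a tightly packed destination buffer of
 * width * height bytes. Assumes one byte per sample. */
void copy_plane(unsigned char *dst, const unsigned char *src,
                int src_stride, int width, int height)
{
    int row;

    for (row = 0; row < height; row++) {
        memcpy(dst + (size_t)row * (size_t)width,
               src + (size_t)row * (size_t)src_stride,
               (size_t)width);
    }
}
```

The luminance plane would be copied with stride_y and the full frame size. For the chrominance planes, stride_uv would be used together with the chroma plane size, which is presumably half the frame size in each direction if the internal buffers are 4:2:0.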

Decore returns DEC_OK if the process works correctly. Decore returns DEC_BAD_FORMAT if the input bitstream does not match the format set at initialization with the current handle.

param2 can be set to NULL or to a pointer to the following structure:

typedef struct
{
	int intra;
	int *quant_store;
	int quant_stride;
} DEC_FRAME_INFO;

In the latter case, the structure is filled with information about the decoded frame, which can be used, for example, for further image processing. The intra field receives zero if the frame was inter (delta frame) and nonzero if it was intra (key frame). The quant_store field receives a pointer to the quantizer array, and quant_stride receives its stride. The array has one entry for each 16x16 pixel macroblock, and quant_stride is the offset between subsequent rows (for example, quant_store[0] is the quantizer for the top left macroblock and quant_store[quant_stride] is the quantizer for the macroblock just below it).
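As an illustration of the quant_store layout, the sketch below averages the quantizer over all macroblocks of a frame. It assumes that the number of macroblocks per row and column is the frame dimension divided by 16 and rounded up, and that quant_stride is counted in array entries, as the indexing example above suggests. The function name is hypothetical.

```c
/* Hypothetical helper: average quantizer over all 16x16 macroblocks
 * of an x_dim x y_dim frame. Returns 0.0 for invalid input. */
double dec_average_quant(const int *quant_store, int quant_stride,
                         int x_dim, int y_dim)
{
    int mb_cols = (x_dim + 15) / 16;   /* macroblocks per row    */
    int mb_rows = (y_dim + 15) / 16;   /* macroblocks per column */
    long sum = 0;
    int row, col;

    if (quant_store == 0 || mb_cols <= 0 || mb_rows <= 0)
        return 0.0;

    for (row = 0; row < mb_rows; row++)
        for (col = 0; col < mb_cols; col++)
            sum += quant_store[row * quant_stride + col];

    return (double)sum / ((double)mb_cols * (double)mb_rows);
}
```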
6.3. Decore - Release

The decore release process starts when DEC_OPT_RELEASE is set at dec_opt. Decore will delete handle from its database. After releasing decore, the application should free the data structures and buffers it allocated.

When DEC_OPT_RELEASE is set, both param1 and param2 have no meaning and should be set to NULL.

Decore returns DEC_OK.

6.4. Decore - Decoding DivX ;-) 3.11

The current release of decore is compatible with the DivX ;-) 3.11 video format (also known as MS MPEG-4 v3). This format is decoded the same way as DivX 4.0, except that DEC_OPT_FRAME_311 must be passed as the second argument of decore().

6.5. Decore - Extra Settings

It is possible to change the output format or the postprocessing level of the decoder using the decoding options DEC_OPT_SETOUT or DEC_OPT_SETPP.
In particular:

To change the output format of the decoder, set the output_format field of a DEC_PARAM structure to the format the decoder should write into bmp, then call decore with the DEC_PARAM structure as param1 and DEC_OPT_SETOUT as the decoding option. Here's an example:

{
   DEC_PARAM DecParam;
   DecParam.color_depth = NULL;
   DecParam.output_format = DEC_RGB32;

  decore((long)this, DEC_OPT_SETOUT, &DecParam, NULL);  // tell decore to output in RGB32 mode

  ...
}

To change the postprocessing level of the decoder the user must set the DEC_SET structure in accordance with the postprocessing level desired and call the decore API passing as param1 the DEC_SET structure and setting DEC_OPT_SETPP as decoder option. 

The DEC_SET structure is defined as:

typedef struct _DEC_SET_
{
   int postproc_level; // valid range is [0..100]
} DEC_SET;

Note that the valid values for postproc_level are integers between 0 and 100.
Here's a code example:

{
   DEC_SET dec_set;
   dec_set.postproc_level = m_iPPLevel;

   decore((long)this, DEC_OPT_SETPP, &dec_set, NULL);
}
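If the level comes from user input, it can be forced into the documented range before it is stored in DEC_SET. A minimal sketch, with a hypothetical function name:

```c
/* Hypothetical helper: forces a postprocessing level into the
 * documented range 0..100. */
int dec_clamp_postproc_level(int level)
{
    if (level < 0)
        return 0;
    if (level > 100)
        return 100;
    return level;
}
```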

Decore is also able to adjust the gamma of output pictures at run-time (note: this feature is not available in DEC_USER mode). Use the DEC_OPT_GAMMA decoding option in the following way:

decore((long)this, DEC_OPT_GAMMA, (void*)mode, (void*)value);
    
Here 'mode' must be one of DEC_GAMMA_CONTRAST, DEC_GAMMA_BRIGHTNESS and DEC_GAMMA_SATURATION; 'value' must be in the range -128 ... 127.
    
6.6. Decore - Example

This is a simple example of how to initialize decore(), decode frames, and release the decoder:


	...

	DEC_MEM_REQS decMemReqs;
	DEC_PARAM decParam;

	decParam.x_dim = m_FrameWidth;
	decParam.y_dim = m_FrameHeight;
	decParam.output_format = m_OutputFormat;
	decParam.time_incr = 15; // time_incr default value
	
	decore((long) this, DEC_OPT_MEMORY_REQS, &decParam, &decMemReqs);

	// the application allocates the data structures and the buffers
	decParam.buffers.mp4_edged_ref_buffers = malloc(decMemReqs.mp4_edged_ref_buffers_size);
	decParam.buffers.mp4_edged_for_buffers = malloc(decMemReqs.mp4_edged_for_buffers_size);
	decParam.buffers.mp4_edged_back_buffers = malloc(decMemReqs.mp4_edged_back_buffers_size);
	decParam.buffers.mp4_display_buffers = malloc(decMemReqs.mp4_display_buffers_size);
	decParam.buffers.mp4_state = malloc(decMemReqs.mp4_state_size);
	decParam.buffers.mp4_tables = malloc(decMemReqs.mp4_tables_size);
	decParam.buffers.mp4_stream = malloc(decMemReqs.mp4_stream_size);
	decParam.buffers.mp4_reference = malloc(decMemReqs.mp4_reference_size);

	memset(decParam.buffers.mp4_state, 0, decMemReqs.mp4_state_size);
	memset(decParam.buffers.mp4_tables, 0, decMemReqs.mp4_tables_size);
	memset(decParam.buffers.mp4_stream, 0, decMemReqs.mp4_stream_size);
	memset(decParam.buffers.mp4_reference, 0, decMemReqs.mp4_reference_size);

	decore((long) this, DEC_OPT_INIT, &decParam, NULL);

	// decode frames
	{
		DEC_FRAME decFrame;

		decFrame.bitstream = m_InputStream;
		decFrame.stride=m_Stride;
		decFrame.bmp = m_OutputBmp;
		decFrame.length = m_StreamLength;
		decFrame.render_flag = 1;

		while ( decore((long) this, DEC_OPT_FRAME, &decFrame, NULL) == DEC_OK )
			;
	}

	decore((long) this, DEC_OPT_RELEASE, NULL, NULL);

	free(decParam.buffers.mp4_display_buffers);
	free(decParam.buffers.mp4_edged_for_buffers);
	free(decParam.buffers.mp4_edged_back_buffers);
	free(decParam.buffers.mp4_edged_ref_buffers);
	free(decParam.buffers.mp4_reference);
	free(decParam.buffers.mp4_state);
	free(decParam.buffers.mp4_stream);
	free(decParam.buffers.mp4_tables);



DivXNetworks, Inc.
DivXNetworks, Inc. Proprietary and Confidential
 


